Optimized Parallel Prefix Sum Algorithm on Optoelectronic Biswapped-Torus Architecture

Similar resources

Architecture Description and Prototype Demonstration of Optoelectronic Parallel-Matching Architecture

We propose an optoelectronic parallel-matching architecture (PMA) that provides powerful processing capability for distributed algorithms compared with traditional parallel computing architectures. The PMA is composed of a parallel-matching (PM) module and multiple processing elements (PEs). The PM module is implemented by a large-fan-out free-space optical interconnection and parallel-matchi...

Improved Parallel Prefix Algorithm on OTIS-Mesh of Trees

A parallel algorithm for prefix computation was recently reported on the interconnection network called OTIS-Mesh of Trees [4]. Using n^4 processors, the algorithm was shown to run in 13 log n + O(1) electronic moves and 2 optical moves for n^4 data points. In this paper we present a new and improved parallel algorithm for prefix computation on the OTIS-Mesh of Trees. The algorithm requires 10 log n + O(1) electronic steps + 1 optic...

A Parallel Matrix Inversion Algorithm on Torus with Adaptive Pivoting

This paper presents a parallel algorithm for matrix inversion on a torus-interconnected MIMD-MC multiprocessor. This method is faster than parallel implementations of other widely used methods, namely Gauss-Jordan, Gauss-Seidel, or LU-decomposition-based inversion. The new algorithm also introduces a novel technique, called adaptive pivoting, for solving the zero-pivot problem at no cost. O...

An Efficient VLSI Architecture for Parallel Prefix Counting With Domino Logic

We propose an efficient reconfigurable parallel prefix counting network based on the recently proposed technique of shift switching with domino logic, in which the charge/discharge signals propagate along the switch chain and produce semaphores, resulting in a network that is fast and highly hardware-compact. The proposed architecture for prefix counting N − 1 bits features a total delay of (4 log N + √N − 2...

Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)

Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms, including polynomial evaluation, sorting, and building data structures. This paper introduces prefix scan and describes a step-by-step procedure for implementing it efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm and procee...
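
For readers unfamiliar with the operation, the following is a minimal sketch of an inclusive prefix sum, written in Python rather than CUDA. It assumes the common doubling-offset ("Hillis-Steele") formulation of a naive scan and simulates each parallel step with an ordinary loop; the excerpt above does not show the paper's own algorithm, so this should not be read as its method. The function name `inclusive_scan_naive` is hypothetical.

```python
def inclusive_scan_naive(values):
    """Inclusive prefix sum using doubling offsets (assumed Hillis-Steele form).

    Each pass of the while loop stands in for one parallel step in which
    every element i >= offset adds the element `offset` positions to its left.
    """
    result = list(values)
    n = len(result)
    offset = 1
    while offset < n:
        # Snapshot the previous step so all reads see the old values,
        # as they would if every position updated simultaneously.
        previous = result[:]
        for i in range(offset, n):
            result[i] = previous[i] + previous[i - offset]
        offset *= 2
    return result


if __name__ == "__main__":
    print(inclusive_scan_naive([3, 1, 7, 0, 4, 1, 6, 3]))
```

This form takes about log2(n) steps but performs on the order of n log n additions in total, which is why it is usually described as naive compared with work-efficient scans.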

Journal

Journal title: Vietnam Journal of Computer Science

Year: 2020

ISSN: 2196-8888, 2196-8896

DOI: 10.1142/s2196888821500159